GPU Acceleration for the C++ Standard Template Library
Abstract
Modern programmers must exploit parallelism for performance gains, possibly through the use of an attached or on-chip GPU. To take advantage of the GPU in C++ programs, the programmer must use either a new language (CUDA or OpenCL) or an external library (Thrust). Rather than requiring that programmers learn new tools, modify existing code, and change software development practices, the C++ Standard Template Library (STL) can be modified to automatically accelerate common algorithms using the GPU. This paper presents libcxxgpu, a GPU-accelerated version of the C++ STL. Using the Thrust library, calls to the algorithms provided by the C++ STL are executed on either the CPU or the GPU, according to a set of heuristics that select the device for each call. In this paper, we detail the implementation of the accelerated library, highlight challenges encountered, and analyze the performance factors that determine which device should be used.
CR Categories: D.1.3 [Programming Techniques]: Concurrent Programming - Parallel Programming; D.2.2 [Software Engineering]: Design Tools and Techniques - Software Libraries
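The abstract does not say which heuristics libcxxgpu uses, so the following is only a minimal sketch of the general idea of per-call device selection, assuming a single input-size cutoff. Every name here (`Device`, `kGpuThreshold`, `choose_device`, `dispatch_sort`) and the threshold value are hypothetical and not taken from the paper. The GPU branch is stubbed with a CPU sort so the example compiles without CUDA. In a Thrust-based implementation that branch would typically copy the range into a `thrust::device_vector` and call `thrust::sort`.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical names throughout; none of this is taken from libcxxgpu.
enum class Device { CPU, GPU };

// Assumed cutoff: below this element count, host-to-device transfer cost
// is presumed to outweigh any GPU speedup. The value is illustrative only.
constexpr std::size_t kGpuThreshold = std::size_t{1} << 20;

// Size-based heuristic: use the GPU only if one is present and the
// input is large enough.
inline Device choose_device(std::size_t n, bool gpu_available) {
    return (gpu_available && n >= kGpuThreshold) ? Device::GPU : Device::CPU;
}

// Sorts [first, last) and reports which device the heuristic selected.
template <typename RandomIt>
Device dispatch_sort(RandomIt first, RandomIt last, bool gpu_available) {
    const auto n = static_cast<std::size_t>(std::distance(first, last));
    const Device device = choose_device(n, gpu_available);
    if (device == Device::GPU) {
        // Stub: a real GPU path would transfer the data and sort it on
        // the device. A CPU sort stands in here so the sketch runs anywhere.
        std::sort(first, last);
    } else {
        std::sort(first, last);
    }
    return device;
}
```

Under these assumptions, calling `dispatch_sort(v.begin(), v.end(), true)` on a three-element `std::vector<int>` selects the CPU path, because the input is far below the cutoff. A practical heuristic would likely also weigh the element type, the algorithm, and whether the data already resides on the device.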
Similar resources
Thrust: A Productivity-Oriented Library for CUDA
This chapter demonstrates how to leverage the Thrust parallel template library to implement high-performance applications with minimal programming effort. Based on the C++ Standard Template Library (STL), Thrust brings a familiar high-level interface to the realm of GPU Computing while remaining fully interoperable with the rest of the CUDA software ecosystem. Applications written...
Multi-Stage Programming for GPUs in Modern C++ using PACXX
Writing and optimizing programs for high performance on systems with GPUs remains a challenging task even for expert programmers. One promising optimization technique is to evaluate parts of the program upfront on the CPU and embed the computed results in the GPU code allowing for more aggressive compiler optimizations. This technique is known as multi-stage programming and has proven to allow ...
Accelerating high-order WENO schemes using two heterogeneous GPUs
A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code written in CUDA programming language is developed by modifying a single-GPU code. The OpenMP library is used for parall...
Gaussian Process Models with Parallelization and GPU acceleration
In this work, we present an extension of Gaussian process (GP) models with sophisticated parallelization and GPU acceleration. The parallelization scheme arises naturally from the modular computational structure w.r.t. datapoints in the sparse Gaussian process formulation. Additionally, the computational bottleneck is implemented with GPU acceleration for further speed up. Combining both techni...
Accelerating QDP++ using GPUs
Graphic Processing Units (GPUs) are getting increasingly important as target architectures in scientific High Performance Computing (HPC). NVIDIA established CUDA as a parallel computing architecture controlling and making use of the compute power of their GPUs. CUDA provides sufficient support for C++ language elements to enable the Expression Template (ET) technique in the device memory domai...
Publication date: 2012